Building a FIFA fantasy team using unsupervised learning techniques

Jeremy Brezovan

October 2019

Why soccer?

  • Increase in popularity here in the US
  • Learn more about the game, teams, players
  • Thanks, bestie. (He says: #sorrynotsorry)

Why fantasy leagues?

  • They're a growing industry
  • "Game of skill" vs. "game of chance"
  • In the US, NFL football is the primary focus. Baseball has also been historically popular.

  • Fantasy sports are also catching on in other countries. The fantasy sports market is growing in Europe, where some US-based companies chose to expand when stateside legislation threatened their domestic growth.

  • Fantasy sports are often categorized as a "game of skill", and as such tend to be permitted in areas that have gambling laws forbidding open betting on sports.

  • Daily Fantasy Sports have seen the largest jump: according to the Wikipedia article about fantasy sports, players spent $5 to play in 2012, but up to $257 per year by 2015.

About the dataset

  • Scraped from sofifa.com
  • Original script courtesy of this Kaggle page
  • Heavily modified: 1) updated for changes to the target website's structure; 2) added status/progress reporting, which the original script lacked; 3) made it at least somewhat restartable.

  • Player stats are updated every two weeks on average

Data cleanup

  • Dropping variables that are unnecessary for this analysis
  • Dropping "legacy" players, players with no club, players on a 'reserve' list (injury or suspension)
  • Converting height and salary values to decimal
  • Encoding categorical variables where it's sensible to do so
  • Filling NaNs and nulls
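Converting height and salary to numbers mostly comes down to string parsing. Below is a minimal sketch of that step, assuming money strings shaped like "€220K" or "€62M" (sofifa's default currency is euros) and heights shaped like 5'11. Those formats and the helper names money_to_number and height_to_inches are assumptions for illustration, not the notebook's actual code.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical helpers: the string formats ("€220K", "€62M", 5'11) are assumptions.
_SUFFIX = {'': 1, 'K': 1_000, 'M': 1_000_000}


def money_to_number(value):
    """Convert a money string such as '€220K' or '€62M' to a float (NaN if unparseable)."""
    if not isinstance(value, str):
        return np.nan
    match = re.fullmatch(r"\D*([\d.]+)\s*([KM]?)", value.strip())
    if match is None:
        return np.nan
    amount, suffix = match.groups()
    return float(amount) * _SUFFIX[suffix]


def height_to_inches(value):
    """Convert a feet'inches string such as 5'11 to inches (NaN if unparseable)."""
    if not isinstance(value, str):
        return np.nan
    match = re.fullmatch(r"(\d+)'(\d+)\"?", value.strip())
    if match is None:
        return np.nan
    feet, inches = (int(g) for g in match.groups())
    return float(feet * 12 + inches)


raw = pd.DataFrame({'Wage': ['€220K', '€0', None],
                    'Value': ['€62M', '€1.5M', '€500K'],
                    'Height': ["5'11", "6'2", "bad"]})
clean = raw.assign(Wage=raw['Wage'].map(money_to_number),
                   Value=raw['Value'].map(money_to_number),
                   Height=raw['Height'].map(height_to_inches))
# Fill the remaining gaps: 0 for money, the median for height (illustrative choices)
clean['Wage'] = clean['Wage'].fillna(0)
clean['Height'] = clean['Height'].fillna(clean['Height'].median())
print(clean)
```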

Let's use clustering to group similar players together...

  • Can unsupervised learning techniques help me choose the best position for a player?
  • Don't feed the model players' overall rankings, or per-position scores

Player attributes

  • Ex: Crossing, Dribbling, Preferred Foot, Body Type, Strength, Stamina
  • More subjective qualities: Vision, Composure, Marking
  • 42 overall attributes; 35 continuous
In [297]:
all_attributes = ['Body Type', 'Preferred Foot', 'International Reputation',
                  'Weak Foot', 'Skill Moves', 'Crossing', 'Finishing',
                  'HeadingAccuracy', 'ShortPassing', 'Volleys', 'Dribbling',
                  'Curve', 'FKAccuracy', 'LongPassing', 'BallControl', 
                  'Acceleration', 'SprintSpeed', 'Agility', 'Reactions', 
                  'Balance', 'ShotPower', 'Jumping', 'Stamina', 'Strength', 
                  'LongShots', 'Aggression', 'Interceptions', 'Positioning',
                  'Vision', 'Penalties', 'Composure', 'DefensiveAwareness', 
                  'Marking', 'StandingTackle', 'SlidingTackle', 'GKDiving', 
                  'GKHandling', 'GKKicking', 'GKPositioning', 'GKReflexes', 
                  'Attack Work Rate', 'Defence Work Rate']
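Seven of these 42 attributes are not continuous, so they need encoding before clustering. Here is one sensible scheme, sketched under assumptions: the work rates are ordered (Low < Medium < High), so they become integers; Preferred Foot is binary; Body Type has no natural order, so it is one-hot encoded. The label values and the helper encode_attributes are illustrative, not taken from the notebook.

```python
import pandas as pd

# Assumed category labels; check them against the scraped data before relying on this.
WORK_RATE = {'Low': 0, 'Medium': 1, 'High': 2}
FOOT = {'Left': 0, 'Right': 1}


def encode_attributes(players):
    """Return a numeric copy of the attribute columns, ready for clustering."""
    encoded = players.copy()
    # Ordered categories become integers that preserve their order
    for col in ['Attack Work Rate', 'Defence Work Rate']:
        encoded[col] = encoded[col].map(WORK_RATE)
    encoded['Preferred Foot'] = encoded['Preferred Foot'].map(FOOT)
    # Body Type has no natural order, so give each value its own 0/1 column
    encoded = pd.get_dummies(encoded, columns=['Body Type'], prefix='Body')
    return encoded.astype(float)


sample = pd.DataFrame({
    'Body Type': ['Lean', 'Stocky', 'Normal'],
    'Preferred Foot': ['Left', 'Right', 'Right'],
    'Attack Work Rate': ['High', 'Medium', 'Low'],
    'Defence Work Rate': ['Low', 'High', 'Medium'],
    'Crossing': [85, 40, 60],
})
print(encode_attributes(sample))
```

One-hot encoding also copes with unusual body-type labels without having to invent an order for them.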
In [303]:
pos_radial_plot(position_attribute_mean_df,position_attribute_max_df)

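Many clustering models, K-Means and Gaussian Mixture among them, require us to choose the number of clusters up front. I used the elbow method to look for a good number of clusters for this data: plot the within-cluster sum of squares (WCSS) against the number of clusters, and look for the point where adding more clusters stops paying off. Here's how that looks: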

In [309]:
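import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Within-cluster sum of squares (K-Means' inertia_) for 1 to 7 clusters
wcss = []
for i in range(1, 8):
    kmeans = KMeans(n_clusters=i, n_init=10, random_state=0)
    kmeans.fit(df_for_clustering)
    wcss.append(kmeans.inertia_)

plt.figure(figsize=(10, 5))
plt.plot(range(1, 8), wcss, marker='o')
plt.xticks(np.arange(1, 8, step=1))
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()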

Build a few models. What works?

Choosing a consistent number of clusters across models makes their output easier to compare. The elbow method suggests just two or three clusters.

Two clusters provide no insight: every model splits the players into goalkeepers and non-goalkeepers.

Three clusters are more promising: the third cluster splits the non-goalkeeper positions roughly into attacking and defensive positions. (Midfielders are split between these two clusters.)

I also tried four clusters. The fourth cluster centered on the midfield positions, but it was not stable under either the K-Means or the Gaussian Mixture model.
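"Not stable" can be put into numbers: refit the same model with different random seeds and measure how much the cluster assignments agree. The sketch below uses scikit-learn's adjusted Rand index, which is 1.0 when two labelings describe the same partition (regardless of how the clusters are numbered). The synthetic make_blobs data stands in for df_for_clustering, and label_stability is a hypothetical helper; the same check works for GaussianMixture by swapping in GaussianMixture(n_components=k, random_state=seed).fit(X).predict(X).

```python
from itertools import combinations

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score


def label_stability(X, n_clusters, seeds=(0, 1, 2, 3, 4)):
    """Mean pairwise adjusted Rand index between K-Means runs with different seeds.

    Values near 1.0 mean every seed finds essentially the same partition;
    lower values suggest the clustering is unstable at this number of clusters.
    """
    # n_init=1 on purpose: each seed gets a single initialization, exposing instability
    runs = [KMeans(n_clusters=n_clusters, n_init=1, random_state=seed).fit_predict(X)
            for seed in seeds]
    scores = [adjusted_rand_score(a, b) for a, b in combinations(runs, 2)]
    return float(np.mean(scores))


# Synthetic stand-in for df_for_clustering: three well-separated groups
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=1.0, random_state=42)
for k in (2, 3, 4):
    print(k, round(label_stability(X, k), 3))
```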

K-Means was my first choice for a model because of its speed and simplicity.

In [312]:
clust_pca = model_comparison(km,df,df_for_clustering,km_pred)

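I tried HDBSCAN after reading about how it works. It's a density-based model: it doesn't need the number of clusters up front, and it can label players in sparse regions as noise instead of forcing them into a cluster.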

In [314]:
clust_pca = model_comparison(hdb,df,df_for_clustering,hdb_pred)

Gaussian Mixture is a soft-clustering model.

Since the other two are hard-clustering models, I thought I would give it a shot and see how its output compares to that of the K-Means and HDBSCAN models.

In [316]:
pca_df = model_comparison(gm,df,df_for_clustering,gm_pred)
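The difference between soft and hard clustering is easiest to see in code. A Gaussian Mixture gives each point a probability of belonging to every component; taking the most likely component recovers a hard label like the ones K-Means produces. This is a minimal sketch on synthetic data standing in for df_for_clustering; the 0.9 cutoff for "borderline" points is an arbitrary illustrative choice, but points like these are where split groups such as the midfielders would show up.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for df_for_clustering (the real feature matrix isn't shown here)
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [4, 4], [-4, 4]],
                  cluster_std=1.5, random_state=0)

gm = GaussianMixture(n_components=3, random_state=0).fit(X)
proba = gm.predict_proba(X)      # shape (n_samples, 3); each row sums to 1
hard = proba.argmax(axis=1)      # the hard label a hard-clustering model would give

# Points whose most likely component is under 90% probable sit between clusters
uncertain = np.flatnonzero(proba.max(axis=1) < 0.9)
print(proba[:3].round(3))
print('borderline points:', len(uncertain))
```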

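We can use silhouette analysis to confirm that three clusters was the best choice. Each point's silhouette score compares a, its mean distance to the other points in its own cluster, with b, its mean distance to the points in the nearest neighboring cluster: s = (b - a) / max(a, b). Scores range from -1 to 1; the closer the average is to 1, the more tightly each cluster holds together and the farther it sits from its neighbors.

With an average score of about 0.92, the three-cluster option scores highest; two clusters comes second at about 0.84, and the scores fall off from four clusters onward.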

In [317]:
silhouette_plot(pca_df)
For n_clusters = 2 The average silhouette_score is : 0.8365055996294737
For n_clusters = 3 The average silhouette_score is : 0.9216960818926211
For n_clusters = 4 The average silhouette_score is : 0.7554436156933947
For n_clusters = 5 The average silhouette_score is : 0.5867819481261668
For n_clusters = 6 The average silhouette_score is : 0.5412660598512237
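The same comparison can be reproduced with scikit-learn's silhouette_score, which returns the mean silhouette over all points. The sketch below fits K-Means for 2 to 6 clusters on synthetic data with three well-separated groups (a stand-in for the real input, which isn't shown) and keeps the number of clusters with the highest average score. The silhouette_plot helper above is not reproduced here.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the clustering input: three well-separated groups
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=1.0, random_state=42)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)   # mean silhouette over all points
    print('For n_clusters = {} the average silhouette score is: {:.3f}'.format(k, scores[k]))

best_k = max(scores, key=scores.get)
print('best:', best_k)
```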

Understanding model behavior--what did it do?

Goalkeepers ended up in their own cluster. They have a set of position-specific attributes (GKDiving, GKHandling, GKKicking, GKPositioning, GKReflexes), which I suspect is why the models separated them so easily from the rest of the pack.

For non-goalkeepers, most attributes are shared across positions, though some do favor a particular position. Those shared characteristics mean that the boundaries between clusters are less obvious.

In [319]:
cluster_pie(df,'Position')

In effect, the models clustered players with similarly strong attributes. Because the desired attributes shift across positions on the pitch, the cluster with the strongest forwards is not the same as the cluster with the strongest defensive players.

We can check summary statistics for the ratings of the players in each cluster, and whittle down the field by finding the cluster with the most highly ranked players.
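In pandas, that check is a groupby on the cluster label. The toy frame below is hypothetical: it borrows the notebook's conventions of one rating column per position (here 'ST') and a 'cluster' column, and picks the cluster with the highest mean rating for that position.

```python
import pandas as pd

# Toy data: 'ST' is one position's rating column and 'cluster' the model's label
players = pd.DataFrame({
    'ST':      [88, 84, 86, 62, 58],
    'cluster': [0, 0, 0, 2, 2],
})

# Summary statistics of the ST rating within each cluster
summary = players.groupby('cluster')['ST'].describe()
print(summary[['count', 'mean', 'max']])

# The cluster with the highest mean rating for this position
top_cluster = summary['mean'].idxmax()
print('top cluster for ST:', top_cluster)
```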

Distribution of player position scores by cluster

We need to identify the cluster containing the strongest players for each position.

  • Ideally: create a new measure of player strength based on attributes

    • This makes better use of the same data that the model saw
  • Given time constraints: lean on the player rankings already in the dataset

    • Good way to confirm the model's work; however...
    • ...the model's work counts for less

In an attempt to identify the best cluster for a given position, I performed some serious number crunching on various statistical measures of the attributes in each cluster, including:

  • Finding the mean and max for each attribute of a position overall, and for each cluster, then comparing the differences
  • Finding the number of attributes for which this difference was >= 0
  • Finding the clusters with the highest sum and mean of the difference

After doing all that work, I realized that I just needed to look for the largest cluster for each position: it consistently holds the highest-ranked players.

In [322]:
for position in all_positions:
    score_distribution(df.loc[(df['Position'] == position) & (df[position] > 0),[position,'cluster']],position)
In [324]:
best_cluster_per_position
Out[324]:
{'LS': 0,
 'ST': 0,
 'RS': 0,
 'LW': 0,
 'LF': 0,
 'CF': 0,
 'RF': 0,
 'RW': 0,
 'LAM': 0,
 'CAM': 0,
 'RAM': 0,
 'LM': 0,
 'LCM': 0,
 'CM': 0,
 'RCM': 2,
 'RM': 0,
 'LWB': 2,
 'LDM': 2,
 'CDM': 2,
 'RDM': 2,
 'RWB': 2,
 'LB': 2,
 'LCB': 2,
 'CB': 2,
 'RCB': 2,
 'RB': 2,
 'GK': 1}
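The "largest cluster per position" rule fits in a few lines of pandas: count players per position and cluster with a crosstab, then take the column with the highest count in each row. This is a sketch on a hypothetical toy frame; how the notebook actually built best_cluster_per_position isn't shown (the score_distribution cell also filters out players with a zero rating for the position).

```python
import pandas as pd

# Toy stand-in for df: each player's listed Position and assigned cluster
players = pd.DataFrame({
    'Position': ['ST', 'ST', 'ST', 'RCM', 'RCM', 'RCM', 'GK', 'GK'],
    'cluster':  [0, 0, 2, 2, 2, 0, 1, 1],
})

# Rows: positions; columns: clusters; cells: number of players
counts = pd.crosstab(players['Position'], players['cluster'])

# The largest cluster for each position (ties go to the lower cluster number)
largest_cluster = counts.idxmax(axis=1).to_dict()
print(largest_cluster)
```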

Satisfied with the models' clustering on attributes? Let's use it to help us choose the best players.

Once we've identified the cluster with the best players for a position, we can compare the players' strengths and salary ranges to determine where our money will be best spent.

In [325]:
# These Bokeh scatterplots don't display properly in the slideshow. :(
for position in all_positions:
    score_wage(df.loc[(df[position] > 85) & (df['Wage'] > 0)],position)

I'm aiming for a 4-3-3 formation for my fantasy team (4 defenders, 3 midfielders, 3 forwards, plus a goalkeeper). This is one of the most common formations.

We have ratings for many more than the ten outfield positions this formation requires, so let's group the position ratings we have in ways that make sense.

In [326]:
starting_players = {'Left Forward': {'positions': ['LF','LS','LW'], 
                                     'affinity': 'L',},
                    'Center Forward': {'positions': ['CF','ST',], 
                                       'affinity': 'C',
                                       'wage%': 1.5,},
                    'Right Forward': {'positions': ['RF','RS','RW'], 
                                      'affinity': 'R',},
                    'Left Midfield': {'positions': ['LM','LAM','CAM','LCM',], 
                                      'affinity': 'L',},
                    'Center Midfield': {'positions': ['CM','CAM','LCM','RCM'], 
                                        'affinity': 'C',},
                    'Right Midfield': {'positions': ['RM','RAM','CAM','RCM',], 
                                       'affinity': 'R',},
                    'Left Back': {'positions': ['LB','LWB','LDM','CDM',], 
                                  'affinity': 'L',},
                    'Right Back': {'positions': ['RB','RWB','RDM','CDM',], 
                                   'affinity': 'R',},  
                    'Left-Center Back': {'positions': ['LCB','CB','LDM'], 
                                         'affinity': 'LC',},
                    'Right-Center Back': {'positions': ['RCB','CB','RDM'], 
                                          'affinity': 'RC',},
                    'Goalkeeper': {'positions': ['GK'], 
                                   'affinity': 'GK',
                                   'wage%': 1.5,},
                   }

Sanity check: make sure the positions I'm combining belong to the same best cluster! Otherwise, a starter group would mix players drawn from different clusters.

In [327]:
check_position(starting_players,best_cluster_per_position)
Left Forward position list: ['LF', 'LS', 'LW']
Center Forward position list: ['CF', 'ST']
Right Forward position list: ['RF', 'RS', 'RW']
Left Midfield position list: ['LM', 'LAM', 'CAM', 'LCM']
Center Midfield position list: ['CM', 'CAM', 'LCM', 'RCM']
	Modified positions for Center Midfield! Was: ['CM', 'CAM', 'LCM', 'RCM']; now: ['CM', 'CAM', 'LCM']
Right Midfield position list: ['RM', 'RAM', 'CAM', 'RCM']
	Modified positions for Right Midfield! Was: ['RM', 'RAM', 'CAM', 'RCM']; now: ['RM', 'RAM', 'CAM']
Left Back position list: ['LB', 'LWB', 'LDM', 'CDM']
Right Back position list: ['RB', 'RWB', 'RDM', 'CDM']
Left-Center Back position list: ['LCB', 'CB', 'LDM']
Right-Center Back position list: ['RCB', 'CB', 'RDM']
Goalkeeper position list: ['GK']
Positions are grouped correctly based on clusters
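The adjustment printed above can be expressed as a small function: for each starter group, find the cluster that most of its positions map to, and drop any position that maps elsewhere. This is a hypothetical reimplementation (align_groups is not the notebook's check_position), but on the Center Midfield group it reproduces the change shown in the output, dropping RCM because its best cluster is 2 rather than 0.

```python
from collections import Counter


def align_groups(groups, best_cluster):
    """Drop positions whose best cluster disagrees with the rest of their group.

    groups: {starter name: {'positions': [...], ...}}  (like starting_players)
    best_cluster: {position: cluster}                 (like best_cluster_per_position)
    Returns a new dict; the input is left unchanged.
    """
    aligned = {}
    for starter, spec in groups.items():
        clusters = [best_cluster[p] for p in spec['positions']]
        # Majority cluster of the group (ties go to the earliest-listed position's cluster)
        majority = Counter(clusters).most_common(1)[0][0]
        kept = [p for p in spec['positions'] if best_cluster[p] == majority]
        if kept != spec['positions']:
            print('Modified positions for {}! Was: {}; now: {}'.format(
                starter, spec['positions'], kept))
        # Copy the group's other settings ('affinity', 'wage%') unchanged
        aligned[starter] = {**spec, 'positions': kept}
    return aligned


best_cluster = {'CM': 0, 'CAM': 0, 'LCM': 0, 'RCM': 2, 'GK': 1}
groups = {'Center Midfield': {'positions': ['CM', 'CAM', 'LCM', 'RCM'], 'affinity': 'C'},
          'Goalkeeper': {'positions': ['GK'], 'affinity': 'GK', 'wage%': 1.5}}
aligned = align_groups(groups, best_cluster)
```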

Let's choose the best player for each position in our formation!

The best_for_position function grew iteratively, with each major iteration looking something like this:

  1. Give me the best player for this position--money is no object.
  2. Okay, trading champagne for soda: give me the best player for the mean salary (or less!) for a given position.
  3. It would make a lot of sense for a player who typically plays left, center, or right to get some extra consideration for a position on that side of the pitch. I called this an "affinity", and awarded a small bonus in the calculations when the player's affinity matches the position (the affinity_bonus column in the output below: 1.5 for a match, 1.0 otherwise).
    • Example: Mo Salah typically plays Right Wing, but ranks highly in other forward positions. I don't want him to end up as a Left Forward or Center Forward just because those positions were calculated before Right Forward.
  4. Trading soda for...better soda? I want the players' ratings to count for more. If they're really good and not too much more expensive, let's spend a little bit more and hire the better player.
  5. On that note, let's also institute a wage floor, so really cheap players don't get the nod when higher-ranked players are available and still within our budget.
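The rankweight column in the debug output below appears consistent with affinity_bonus × rating³ ÷ wage (for example, L. Sané as Left Forward: 1.5 × 85.333³ ÷ 195,000 ≈ 4.78), and each group's starter is the candidate with the highest rankweight among players not already picked. Cubing the rating is one way to make ratings "count for more". The sketch below puts those pieces together; the wage cap (the group's wage% times the mean candidate wage) and the floor are assumptions, since the exact rules aren't shown, and pick_starter is a hypothetical name.

```python
import numpy as np
import pandas as pd


def pick_starter(pool, group, affinity, taken=(), wage_pct=1.0,
                 wage_floor=1, affinity_bonus=1.5):
    """Return the row of the best-value player for one starter group (None if nobody fits).

    pool: candidates with 'Name', 'Wage', 'affinity' and a `group` rating column
    (e.g. 'Left Forward'). The cap and floor rules are assumptions; wage_floor
    must be positive so the division below never divides by zero.
    """
    # Skip players already chosen; work on a copy so new columns don't
    # trigger pandas' SettingWithCopyWarning
    cands = pool[~pool['Name'].isin(taken)].copy()
    # Assumed wage cap: a multiple (the group's 'wage%') of the mean candidate wage
    cap = cands['Wage'].mean() * wage_pct
    cands = cands[(cands['Wage'] <= cap) & (cands['Wage'] >= wage_floor)].copy()
    if cands.empty:
        return None
    # Players whose side of the pitch matches the group get a bonus multiplier
    cands['affinity_bonus'] = np.where(cands['affinity'] == affinity, affinity_bonus, 1.0)
    # Cubing the rating makes rating differences outweigh modest wage differences
    cands['rankweight'] = cands['affinity_bonus'] * cands[group] ** 3 / cands['Wage']
    return cands.loc[cands['rankweight'].idxmax()]


left_forwards = pd.DataFrame({
    'Name': ['L. Sané', 'S. Mané', 'M. Salah'],
    'Wage': [195000, 220000, 240000],
    'affinity': ['L', 'L', 'R'],
    'Left Forward': [85.333333, 89.0, 89.666667],
})
best = pick_starter(left_forwards, 'Left Forward', 'L', wage_pct=1.5)
print(best['Name'], round(best['rankweight'], 3))
```

With the rows copied from the Left Forward output, Mané edges out Sané (about 4.807 vs. 4.780), matching the notebook's pick; Salah gets no bonus on the left and falls to about 3.0.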
In [328]:
df = best_for_position(df,starting_players,debug=True)
Starter: Left Forward
               Name    Wage Position  affinity_bonus  Left Forward  rankweight
0           L. Sané  195000       LW             1.5     85.333333    4.779834
118       P. Dybala  215000       CF             1.0     86.333333    2.992934
152     R. Sterling  255000       LW             1.5     86.333333    3.785181
155       Neymar Jr  290000       LW             1.5     90.666667    3.855105
203        M. Salah  240000       RW             1.0     89.666667    3.003875
206         H. Kane  220000       ST             1.0     87.000000    2.993195
244         S. Mané  220000       LW             1.5     89.000000    4.806607
245         G. Bale  250000       RM             1.0     87.000000    2.634012
249  R. Lewandowski  235000       LS             1.5     87.333333    4.251709
267   P. Aubameyang  205000       LM             1.5     86.333333    4.708395
472       E. Cavani  195000       ST             1.0     85.666667    3.224044
539      K. Benzema  285000       CF             1.0     86.333333    2.257827
        Name       Club    Wage Position affinity  Left Forward
244  S. Mané  Liverpool  220000       LW        L          89.0
Starter: Center Forward
wage% is set for Center Forward: 1.5
                  Name    Wage Position  affinity_bonus  Center Forward  \
96   Cristiano Ronaldo  405000       ST             1.5            93.5   
118          P. Dybala  215000       CF             1.5            85.5   
141       A. Griezmann  370000      CAM             1.5            89.0   
155          Neymar Jr  290000       LW             1.0            89.5   
203           M. Salah  240000       RW             1.0            89.0   
206            H. Kane  220000       ST             1.5            88.0   
209       K. De Bruyne  370000      RCM             1.0            87.5   
214             H. Son  185000       LS             1.0            87.5   
245            G. Bale  250000       RM             1.0            87.0   
249     R. Lewandowski  235000       LS             1.0            88.5   
267      P. Aubameyang  205000       LM             1.0            86.5   
338          L. Suárez  355000       ST             1.5            90.5   
444          S. Agüero  300000       ST             1.5            90.0   
472          E. Cavani  195000       ST             1.5            87.0   
539         K. Benzema  285000       CF             1.5            86.5   

     rankweight  
96     3.027409  
118    4.360649  
141    2.857982  
155    2.472129  
203    2.937371  
206    4.646400  
209    1.810600  
214    3.621199  
245    2.634012  
249    2.949592  
267    3.157145  
338    3.131905  
444    3.645000  
472    5.065408  
539    3.406393  
          Name                 Club    Wage Position affinity  Center Forward
472  E. Cavani  Paris Saint-Germain  195000       ST        C            87.0
Starter: Right Forward
                  Name    Wage Position  affinity_bonus  Right Forward  \
0              L. Sané  195000       LW             1.0      85.333333   
96   Cristiano Ronaldo  405000       ST             1.0      93.000000   
118          P. Dybala  215000       CF             1.0      86.333333   
141       A. Griezmann  370000      CAM             1.0      89.333333   
152        R. Sterling  255000       LW             1.0      86.333333   
155          Neymar Jr  290000       LW             1.0      90.666667   
203           M. Salah  240000       RW             1.5      89.666667   
206            H. Kane  220000       ST             1.0      87.000000   
209       K. De Bruyne  370000      RCM             1.0      88.333333   
245            G. Bale  250000       RM             1.5      87.000000   
249     R. Lewandowski  235000       LS             1.0      87.333333   
267      P. Aubameyang  205000       LM             1.0      86.333333   
338          L. Suárez  355000       ST             1.0      89.666667   
444          S. Agüero  300000       ST             1.0      89.333333   
539         K. Benzema  285000       CF             1.0      86.333333   

     rankweight  
0      3.186556  
96     1.986067  
118    2.992934  
141    1.926810  
152    2.523454  
155    2.570070  
203    4.505812  
206    2.993195  
209    1.862825  
245    3.951018  
249    2.834472  
267    3.138930  
338    2.030789  
444    2.376399  
539    2.257827  
         Name       Club    Wage Position affinity  Right Forward
203  M. Salah  Liverpool  240000       RW        R      89.666667
Starter: Left Midfield
                  Name    Wage Position  affinity_bonus  Left Midfield  \
96   Cristiano Ronaldo  405000       ST             1.0          89.25   
118          P. Dybala  215000       CF             1.0          87.25   
141       A. Griezmann  370000      CAM             1.0          88.75   
152        R. Sterling  255000       LW             1.5          86.25   
155          Neymar Jr  290000       LW             1.5          90.75   
158           P. Pogba  250000      RDM             1.0          86.75   
209       K. De Bruyne  370000      RCM             1.0          90.75   
214             H. Son  185000       LS             1.5          86.25   
224         C. Eriksen  205000      CAM             1.0          88.50   
250           Coutinho  175000      CAM             1.0          86.50   
309               Isco  245000       LW             1.5          85.50   
316            M. Reus  170000       ST             1.0          87.25   
338          L. Suárez  355000       ST             1.0          87.25   
355    Roberto Firmino  170000       CF             1.0          86.25   
444          S. Agüero  300000       ST             1.0          85.50   
446             Thiago  180000       CM             1.0          86.50   
450          L. Modrić  340000      RCM             1.0          89.00   
495        David Silva  265000      LCM             1.0          87.00   

     rankweight  
96     1.755374  
118    3.089284  
141    1.889311  
152    3.774230  
155    3.865745  
158    2.611370  
209    2.019939  
214    5.202317  
224    3.381240  
250    3.698369  
309    3.826692  
316    3.907036  
338    1.870975  
355    3.774230  
444    2.083421  
446    3.595637  
450    2.073438  
495    2.484917  
       Name               Club    Wage Position affinity  Left Midfield
214  H. Son  Tottenham Hotspur  185000       LS        L          86.25
Starter: Center Midfield
                  Name    Wage Position  affinity_bonus  Center Midfield  \
96   Cristiano Ronaldo  405000       ST             1.5        86.333333   
141       A. Griezmann  370000      CAM             1.5        87.333333   
155          Neymar Jr  290000       LW             1.0        87.666667   
158           P. Pogba  250000      RDM             1.0        87.000000   
209       K. De Bruyne  370000      RCM             1.0        90.333333   
224         C. Eriksen  205000      CAM             1.5        88.333333   
250           Coutinho  175000      CAM             1.5        85.333333   
338          L. Suárez  355000       ST             1.5        85.666667   
355    Roberto Firmino  170000       CF             1.5        85.666667   
446             Thiago  180000       CM             1.5        87.000000   
450          L. Modrić  340000      RCM             1.0        89.666667   
451          M. Pjanić  180000      CDM             1.5        86.000000   
495        David Silva  265000      LCM             1.0        86.666667   
519           T. Kroos  330000      LCM             1.0        86.666667   
777         I. Rakitić  245000       CM             1.5        85.333333   
878        I. Gündoğan  180000       CM             1.5        85.666667   

     rankweight  
96     2.383262  
141    2.700410  
155    2.323301  
158    2.634012  
209    1.992243  
224    5.043259  
250    5.326100  
338    2.656431  
355    5.547253  
446    5.487525  
450    2.120382  
451    5.300467  
495    2.456464  
519    1.972615  
777    3.804357  
878    5.239072  
                Name       Club    Wage Position affinity  Center Midfield
355  Roberto Firmino  Liverpool  170000       CF        C        85.666667
Starter: Right Midfield
              Name    Wage Position  affinity_bonus  Right Midfield  \
39       K. Mbappé  155000       RM             1.5       89.000000   
100     O. Dembélé  195000       RW             1.5       85.333333   
118      P. Dybala  215000       CF             1.0       88.666667   
141   A. Griezmann  370000      CAM             1.0       89.666667   
152    R. Sterling  255000       LW             1.0       88.000000   
155      Neymar Jr  290000       LW             1.0       92.666667   
158       P. Pogba  250000      RDM             1.5       86.666667   
206        H. Kane  220000       ST             1.0       85.666667   
209   K. De Bruyne  370000      RCM             1.0       91.000000   
224     C. Eriksen  205000      CAM             1.0       88.666667   
245        G. Bale  250000       RM             1.5       85.333333   
250       Coutinho  175000      CAM             1.0       87.333333   
309           Isco  245000       LW             1.0       86.000000   
316        M. Reus  170000       ST             1.0       88.666667   
338      L. Suárez  355000       ST             1.0       88.333333   
404    A. Di María  150000       RM             1.5       86.666667   
444      S. Agüero  300000       ST             1.0       87.333333   
446         Thiago  180000       CM             1.0       86.333333   
450      L. Modrić  340000      RCM             1.0       88.666667   
495    David Silva  265000      LCM             1.0       87.333333   
539     K. Benzema  285000       CF             1.0       86.333333   
2485          Xavi  150000       CM             1.0       86.666667   

      rankweight  
39      6.822281  
100     4.779834  
118     3.242222  
141     1.948459  
152     2.672439  
155     2.743927  
158     3.905778  
206     2.857676  
209     2.036678  
224     3.400379  
245     3.728270  
250     3.806292  
309     2.596147  
316     4.100457  
338     1.941536  
404     6.509630  
444     2.220337  
446     3.574893  
450     2.050228  
495     2.513589  
539     2.257827  
2485    4.339753  
         Name                 Club    Wage Position affinity  Right Midfield
39  K. Mbappé  Paris Saint-Germain  155000       RM        R            89.0
Starter: Left Back
              Name    Wage Position  affinity_bonus  Left Back  rankweight
275       N. Kanté  235000      LDM             1.5      88.75    4.461989
492   Sergio Ramos  300000      RCB             1.0      85.50    2.083421
583       A. Vidal  205000      LCM             1.0      85.75    3.075732
692     Jordi Alba  240000       LB             1.5      85.50    3.906415
725       Carvajal  205000       RB             1.0      85.50    3.048909
3433       P. Lahm  140000       RB             1.0      87.75    4.826289
         Name               Club    Wage Position affinity  Left Back
3433  P. Lahm  FC Bayern München  140000       RB        R      87.75
Starter: Right Back
             Name    Wage Position  affinity_bonus  Right Back  rankweight
275      N. Kanté  235000      LDM             1.0       88.75    2.974659
492  Sergio Ramos  300000      RCB             1.0       85.50    2.083421
583      A. Vidal  205000      LCM             1.0       85.75    3.075732
692    Jordi Alba  240000       LB             1.0       85.50    2.604277
725      Carvajal  205000       RB             1.5       85.50    4.573364
         Name         Club    Wage Position affinity  Right Back
725  Carvajal  Real Madrid  205000       RB        R        85.5
Starter: Left-Center Back
                 Name    Wage Position  affinity_bonus  Left-Center Back  \
166       V. van Dijk  200000      LCB             1.5         88.666667   
228        A. Laporte  195000       CB             1.0         86.000000   
238         S. Umtiti  210000      LCB             1.5         85.666667   
275          N. Kanté  235000      LDM             1.0         87.333333   
305      K. Koulibaly  150000      LCB             1.5         86.666667   
365       Fernandinho  200000      LCB             1.5         85.666667   
492      Sergio Ramos  300000      RCB             1.0         89.000000   
576          Casemiro  240000      CDM             1.0         87.000000   
583          A. Vidal  205000      LCM             1.5         87.000000   
616   Sergio Busquets  300000      CDM             1.0         86.333333   
617   T. Alderweireld  155000      RCB             1.0         86.666667   
783      G. Chiellini  215000      LCB             1.5         86.000000   
815          D. Godín  135000      RCB             1.0         87.000000   
853             Piqué  285000      RCB             1.0         87.333333   
869      Thiago Silva  135000      LCB             1.5         86.333333   
894     J. Vertonghen  155000      LCB             1.5         86.333333   
1055       L. Bonucci  160000      RCB             1.0         86.333333   

      rankweight  
166     5.228082  
228     3.261826  
238     4.490633  
275     2.834472  
305     6.509630  
365     4.715165  
492     2.349897  
576     2.743762  
583     4.818315  
616     2.144936  
617     4.199761  
783     4.437600  
815     4.877800  
853     2.337197  
869     7.149786  
894     6.227233  
1055    4.021754  
             Name                 Club    Wage Position affinity  \
869  Thiago Silva  Paris Saint-Germain  135000      LCB       LC   

     Left-Center Back  
869         86.333333  
Starter: Right-Center Back
                 Name    Wage Position  affinity_bonus  Right-Center Back  \
166       V. van Dijk  200000      LCB             1.0          88.666667   
228        A. Laporte  195000       CB             1.0          86.000000   
238         S. Umtiti  210000      LCB             1.0          85.666667   
275          N. Kanté  235000      LDM             1.0          87.333333   
305      K. Koulibaly  150000      LCB             1.0          86.666667   
365       Fernandinho  200000      LCB             1.0          85.666667   
492      Sergio Ramos  300000      RCB             1.5          89.000000   
576          Casemiro  240000      CDM             1.0          87.000000   
583          A. Vidal  205000      LCM             1.0          87.000000   
616   Sergio Busquets  300000      CDM             1.0          86.333333   
617   T. Alderweireld  155000      RCB             1.5          86.666667   
783      G. Chiellini  215000      LCB             1.0          86.000000   
815          D. Godín  135000      RCB             1.5          87.000000   
853             Piqué  285000      RCB             1.5          87.333333   
894     J. Vertonghen  155000      LCB             1.0          86.333333   
1055       L. Bonucci  160000      RCB             1.5          86.333333   

      rankweight  
166     3.485388  
228     3.261826  
238     2.993755  
275     2.834472  
305     4.339753  
365     3.143443  
492     3.524845  
576     2.743762  
583     3.212210  
616     2.144936  
617     6.299642  
783     2.958400  
815     7.316700  
853     3.505795  
894     4.151488  
1055    6.032632  
         Name   Club    Wage Position affinity  Right-Center Back
815  D. Godín  Inter  135000      RCB       RC               87.0
Starter: Goalkeeper
wage% is set for Goalkeeper: 1.5
             Name    Wage Position  affinity_bonus  Goalkeeper  rankweight
345   T. Courtois  235000       GK               1        88.0    2.899881
377       Alisson  155000       GK               1        89.0    4.548187
411      J. Oblak  125000       GK               1        91.0    6.028568
471       Ederson  185000       GK               1        88.0    3.683632
766      M. Neuer  155000       GK               1        88.0    4.396594
1142    H. Lloris  150000       GK               1        88.0    4.543147
         Name             Club    Wage Position affinity  Goalkeeper
411  J. Oblak  Atlético Madrid  125000       GK        G        91.0

And here are my starting 11!

In [330]:
# This renders poorly in the notebook...sorry.
from IPython.display import HTML
HTML(filename='starting_players.html')
Out[330]:

Position            Player           Age  Nationality     Club                 Rank
Left Forward        S. Mané          27   Senegal         Liverpool            89.0
Center Forward      E. Cavani        32   Uruguay         Paris Saint-Germain  87.0
Right Forward       M. Salah         27   Egypt           Liverpool            89.7
Left Midfield       H. Son           26   Korea Republic  Tottenham Hotspur    86.2
Center Midfield     Roberto Firmino  27   Brazil          Liverpool            85.7
Right Midfield      K. Mbappé        20   France          Paris Saint-Germain  89.0
Left Back           P. Lahm          32   Germany         FC Bayern München    87.8
Left-Center Back    Thiago Silva     34   Brazil          Paris Saint-Germain  86.3
Right-Center Back   D. Godín         33   Uruguay         Inter                87.0
Right Back          Carvajal         27   Spain           Real Madrid          85.5
Goalkeeper          J. Oblak         26   Slovenia        Atlético Madrid      91.0

And let's see how much money we spent on the players we chose.

NOTE: I didn't finish implementing a salary cap.

  • The FIFA wage data don't line up with the salaries of the handful of players I spot-checked against Spotrac, which tracks contracts and salaries for players across a number of professional sports.

Example: Sadio Mané

  • FIFA: Wage of €220,000 (not sure about the pay period). Value of €62,000,000.
  • Spotrac: Weekly salary of £100,000. Annual salary £5.2 million.
  • Fantasy Premier League sets a budget cap of £100 million, so I used that value as a starting point. I ended up dropping it to 10 million, and even that seems high for the provided wages: the starting 11 below cost less than 2 million.
In [331]:
budget = 10000000  # same units as the FIFA 'Wage' column
spent = 0
for starter, info in starting_players.items():
    # best_for_position stored each starter's 'ID' and 'Name'; look up the wage by ID
    player_wage = int(df.loc[df['ID'] == info['ID'], 'Wage'].iloc[0])
    print("{}:\t{}, {}".format(starter, info['Name'], player_wage))
    spent += player_wage
print('Spent: {}; Remaining budget: {}'.format(spent, budget - spent))
Left Forward:	S. Mané, 220000
Center Forward:	E. Cavani, 195000
Right Forward:	M. Salah, 240000
Left Midfield:	H. Son, 185000
Center Midfield:	Roberto Firmino, 170000
Right Midfield:	K. Mbappé, 155000
Left Back:	P. Lahm, 140000
Right Back:	Carvajal, 205000
Left-Center Back:	Thiago Silva, 135000
Right-Center Back:	D. Godín, 135000
Goalkeeper:	J. Oblak, 125000
Spent: 1905000; Remaining budget: 8095000

Future work

  • Find and implement a reasonable salary cap.
    • I already made some modifications to keep the function from simply choosing the absolute highest-ranked players, or the absolute cheapest. The next step is to have it decide whether it can afford a better player for a position, or whether it must settle for a player who may be marginally worse but fits the budget.
  • Use a different site for salary/wage information, like Spotrac.
  • Set the budget in US dollars, euros, pounds sterling, etc. The website's default is euros, but if we only want to look at MLS players, for instance, US dollars would make more sense.
  • FIFA 2020 data is updated every two weeks or so. Automated refresh of the dataset? Use an API instead of scraping?
  • Build a dashboard that allows a person to select a desired salary range and min/max player ratings, using those criteria to select the best available player.
  • Add a league filter (league isn't in this dataset) to focus on just MLS teams, or the Premier League, Bundesliga, etc. This would mean pulling in league rosters or schedules to figure out which teams belong to which league(s).
  • Factor in players' actual performance in recent matches (current season, maybe also recent past seasons).
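For the salary-cap item, one standard formulation is a multiple-choice knapsack: pick exactly one candidate per slot so that the total rating is as high as possible while the total wage stays within budget. Below is a dynamic-programming sketch of that idea, not something the notebook implements. It assumes candidate lists don't share players, and it measures cost in steps of 5,000 (every wage in the output above is a multiple of 5,000); rounding each wage up to a whole step keeps the result within budget. The function name pick_team and the choice to maximize total rating rather than total rankweight are assumptions.

```python
def pick_team(slots, budget, step=5000):
    """Choose one (name, wage, rating) per slot, maximizing total rating within budget.

    slots: {slot name: [(name, wage, rating), ...]}
    Assumes no player appears in more than one slot's candidate list.
    Returns (total_rating, {slot: name}), or None if no team fits the budget.
    """
    capacity = budget // step
    # best[c] = (total rating, picks) of the best partial team costing exactly c steps
    best = {0: (0.0, {})}
    for slot, candidates in slots.items():
        nxt = {}
        for cost, (total, picks) in best.items():
            for name, wage, rating in candidates:
                c = cost + -(-wage // step)   # round the wage up to whole steps
                if c > capacity:
                    continue
                entry = (total + rating, {**picks, slot: name})
                if c not in nxt or entry[0] > nxt[c][0]:
                    nxt[c] = entry
        best = nxt
        if not best:
            return None   # no affordable candidate for this slot
    return max(best.values(), key=lambda entry: entry[0])


# Ratings and wages copied from the Center Forward and Goalkeeper output above
slots = {
    'Center Forward': [('E. Cavani', 195000, 87.0), ('Cristiano Ronaldo', 405000, 93.5)],
    'Goalkeeper': [('J. Oblak', 125000, 91.0), ('T. Courtois', 235000, 88.0)],
}
print(pick_team(slots, budget=550000))
print(pick_team(slots, budget=500000))
```

With 550,000 to spend, the sketch can afford Ronaldo alongside Oblak; at 500,000 it settles for Cavani, which is exactly the "afford a better player or settle" trade-off described above.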

#YNWA